Audio-visual integration for robust speech recognition using maximum weighted stream posteriors

Authors

  • Rowan Seymour
  • Darryl Stewart
  • Ji Ming
Abstract

In this paper, we demonstrate for the first time the robustness of the Maximum Stream Posterior (MSP) method for audio-visual integration on a large speaker-independent speech recognition task in noisy conditions. Furthermore, we show that the method can be generalised and improved by using a softer weighting scheme to account for moderate noise conditions. We call this generalised method the Maximum Weighted Stream Posterior (MWSP) method. In addition, we carry out the first tests of the Posterior Union Model approach for audio-visual integration. All of the methods are compared in digit recognition tests involving various audio and video noise levels and conditions, including tests where both modalities are affected by noise. We also introduce a novel form of noise called jitter, which is used to simulate camera movement. The results verify that the MSP approach is robust and that its generalised form (MWSP) can lead to further improvements in moderate noise conditions.

Similar resources

A new posterior based audio-visual integration method for robust speech recognition

We describe the development of a multistream HMM based audio-visual speech recognition (AVSR) system and a new method for integrating the audio and visual streams using frame level posterior probabilities. This is compared to the standard feature concatenation and weighted product methods in speaker-dependent tests using our own multimodal database, by examining speech recognition robustness to...

Asynchronous stream modeling for large vocabulary audio-visual speech recognition

This paper addresses the problem of audio-visual information fusion to provide highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams and propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information integration. We show how these models can be tr...

Continuous Audio-visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal model...

Continuous Audio-Visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mode...

IDIAP Martigny - Valais - Suisse Continuous Audio-Visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mo...

Publication date: 2007